Markov Properties for Graphical Models with Cycles and Latent Variables
We investigate probabilistic graphical models that allow for both cycles and
latent variables. For this we introduce directed graphs with hyperedges
(HEDGes), generalizing and combining both marginalized directed acyclic graphs
(mDAGs) that can model latent (dependent) variables, and directed mixed graphs
(DMGs) that can model cycles. We define and analyse several different Markov
properties that relate the graphical structure of a HEDG with a probability
distribution on a corresponding product space over the set of nodes, for
example factorization properties, structural equations properties,
ordered/local/global Markov properties, and marginal versions of these. The
various Markov properties for HEDGes are in general not equivalent to each
other when cycles or hyperedges are present, in contrast with the simpler case
of directed acyclic graphical (DAG) models (also known as Bayesian networks).
We show how the Markov properties for HEDGes - and thus the corresponding
graphical Markov models - are logically related to each other.
Comment: 131 pages
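For the acyclic special case mentioned above (DAG/Bayesian-network models), the factorization and local Markov properties coincide and are easy to verify numerically. The following is a minimal sketch for a three-node chain A -> B -> C; all probability tables are illustrative assumptions, not from the paper:

```python
# Toy DAG A -> B -> C with binary variables; all numbers are
# illustrative. Factorization: P(a, b, c) = P(a) P(b | a) P(c | b).
P_a = {0: 0.6, 1: 0.4}
P_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    return P_a[a] * P_b_given_a[a][b] * P_c_given_b[b][c]

def cond_c_given(a, b):
    # P(C = 1 | A = a, B = b), computed from the joint.
    return joint(a, b, 1) / sum(joint(a, b, c) for c in (0, 1))

# Local Markov property: C is independent of the non-descendant A
# given its parent B, so P(c | a, b) does not depend on a.
for b in (0, 1):
    assert abs(cond_c_given(0, b) - cond_c_given(1, b)) < 1e-12
```

With cycles or hyperedges present, as the abstract notes, such equivalences between Markov properties break down.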
Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders
We address the problem of causal discovery from data, making use of the
recently proposed causal modeling framework of modular structural causal models
(mSCM) to handle cycles, latent confounders and non-linearities. We introduce
σ-connection graphs (σ-CG), a new class of mixed graphs
(containing undirected, bidirected and directed edges) with additional
structure, and extend the concept of σ-separation, the appropriate
generalization of the well-known notion of d-separation in this setting, to
apply to σ-CGs. We prove the closedness of σ-separation under
marginalisation and conditioning and exploit this to implement a test of
σ-separation on a σ-CG. This then leads us to the first causal
discovery algorithm that can handle non-linear functional relations, latent
confounders, cyclic causal relationships, and data from different (stochastic)
perfect interventions. As a proof of concept, we show on synthetic data how
well the algorithm recovers features of the causal graph of modular structural
causal models.
Comment: Accepted for publication in Conference on Uncertainty in Artificial Intelligence 201
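As background, σ-separation generalizes d-separation, which in the acyclic case can be tested via the moralized ancestral graph. The sketch below implements that classic baseline test, not the paper's σ-CG procedure; the function name `d_separated` and the example graphs are illustrative assumptions:

```python
def d_separated(parents, x, y, z):
    # parents maps each node to its set of parents (an acyclic digraph).
    # Classic d-separation test of x and y given z via the moralized
    # ancestral graph.
    def ancestors(seed):
        seen, stack = set(seed), list(seed)
        while stack:
            n = stack.pop()
            for p in parents.get(n, set()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    keep = ancestors({x, y} | set(z))           # 1. ancestral restriction
    adj = {n: set() for n in keep}
    for n in keep:                              # 2. moralize
        ps = [p for p in parents.get(n, set()) if p in keep]
        for p in ps:
            adj[n].add(p)
            adj[p].add(n)
        for i in range(len(ps)):                # marry co-parents
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])

    blocked = set(z)                            # 3. reachability avoiding z
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False                        # connected: not d-separated
        for m in adj[n]:
            if m not in seen and m not in blocked:
                seen.add(m)
                stack.append(m)
    return True
```

For example, in the collider X -> C <- Y, X and Y are d-separated marginally but become dependent once C is conditioned on; σ-separation refines this picture in the cyclic case.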
Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias
We prove the main rules of causal calculus (also called do-calculus) for i/o
structural causal models (ioSCMs), a generalization of a recently proposed
general class of (non-)linear structural causal models that allow for cycles,
latent confounders and arbitrary probability distributions. We also generalize
adjustment criteria and formulas from the acyclic setting to the general one
(i.e. ioSCMs). Such criteria then allow one to estimate (conditional) causal
effects from observational data that was (partially) gathered under selection
bias and in the presence of cycles. This generalizes the backdoor criterion, the
selection-backdoor criterion and extensions of these to arbitrary ioSCMs.
Together, our results thus enable causal reasoning in the presence of cycles,
latent confounders and selection bias. Finally, we extend the ID algorithm for
the identification of causal effects to ioSCMs.
Comment: Accepted for publication in Conference on Uncertainty in Artificial Intelligence 2019 (UAI-2019)
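The backdoor criterion that these results generalize can be checked numerically in a small discrete, acyclic example. The distributions below are illustrative assumptions; the sketch shows that adjustment over the confounder recovers the interventional distribution while naive conditioning does not:

```python
# Toy discrete model with confounder Z -> X, Z -> Y and effect X -> Y.
# All numbers are illustrative assumptions.
P_z = {0: 0.3, 1: 0.7}                                    # P(Z = z)
P_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(X = x | Z = z)
P_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.4,
                 (1, 0): 0.6, (1, 1): 0.9}                # P(Y = 1 | x, z)

def p_y1_do_x(x):
    # Ground truth under do(X = x): delete the edge Z -> X and
    # marginalize Z in the mutilated model.
    return sum(P_z[z] * P_y1_given_xz[(x, z)] for z in (0, 1))

def p_y1_backdoor(x):
    # Backdoor adjustment over Z: P(Y=1 | do(x)) = sum_z P(Y=1 | x, z) P(z).
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in (0, 1))

def p_y1_obs(x):
    # Naive observational conditioning P(Y = 1 | X = x), confounded by Z.
    num = sum(P_z[z] * P_x_given_z[z][x] * P_y1_given_xz[(x, z)] for z in (0, 1))
    den = sum(P_z[z] * P_x_given_z[z][x] for z in (0, 1))
    return num / den
```

Here the adjustment estimand coincides with the interventional quantity, whereas the naive conditional probability is biased by the confounder.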
On the Effectiveness of Hybrid Mutual Information Estimation
Estimating the mutual information from samples drawn from a joint distribution is a
challenging problem in both science and engineering. In this work, we realize a
variational bound that generalizes both discriminative and generative
approaches. Using this bound, we propose a hybrid method to mitigate their
respective shortcomings. Further, we propose Predictive Quantization (PQ): a
simple generative method that can be easily combined with discriminative
estimators for minimal computational overhead. Our propositions yield a tighter
bound on the information thanks to the reduced variance of the estimator. We
test our methods on a challenging task of correlated high-dimensional Gaussian
distributions and a stochastic process involving a system of free particles
subjected to a fixed energy landscape. Empirical results show that hybrid
methods consistently improve mutual information estimates compared to their
discriminative counterparts.
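For the correlated-Gaussian benchmark, the ground-truth mutual information is available in closed form, I(X;Y) = -0.5 log(1 - rho^2), which makes it a convenient sanity check for any estimator. Below is a standard-library-only sketch (a plain Monte Carlo average of the log density ratio, not the hybrid estimator from the paper):

```python
import math
import random

# Bivariate standard Gaussian with correlation rho: the mutual
# information has the closed form I(X; Y) = -0.5 * log(1 - rho^2).
rho = 0.8
true_mi = -0.5 * math.log(1.0 - rho * rho)

def log_density_ratio(x, y):
    # log p(x, y) - log p(x) - log p(y) for unit-variance marginals.
    q = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))
    return -0.5 * math.log(1.0 - rho * rho) - q + 0.5 * (x * x + y * y)

random.seed(0)
n = 200_000
total = 0.0
for _ in range(n):
    g1, g2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x, y = g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2
    total += log_density_ratio(x, y)
mc_mi = total / n   # Monte Carlo estimate of the mutual information
```

Variational estimators bound this quantity from below; the sketch instead uses the known densities directly, which is only possible because the benchmark distribution is fully specified.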
An Information-theoretic Approach to Distribution Shifts
Safely deploying machine learning models to the real world is often a
challenging process. Models trained with data obtained from a specific
geographic location tend to fail when queried with data obtained elsewhere,
agents trained in a simulation can struggle to adapt when deployed in the real
world or novel environments, and neural networks that are fit to a subset of
the population might carry some selection bias into their decision process. In
this work, we describe the problem of data shift from a novel
information-theoretic perspective by (i) identifying and describing the
different sources of error, and (ii) comparing some of the most promising
objectives explored in the recent domain generalization and fair
classification literature. From our theoretical analysis and empirical
evaluation, we conclude that the model selection procedure needs to be guided
by careful considerations regarding the observed data, the factors used for
correction, and the structure of the data-generating process.
Multi-objective optimization via equivariant deep hypervolume approximation
Optimizing multiple competing objectives is a common problem across science
and industry. The inherent inextricable trade-off between those objectives
leads one to the task of exploring their Pareto front. A meaningful quantity
for the purpose of the latter is the hypervolume indicator, which is used in
Bayesian Optimization (BO) and Evolutionary Algorithms (EAs). However, the
computational complexity of calculating the hypervolume scales unfavorably
with the number of objectives and data points, which restricts its use in
these common multi-objective optimization frameworks. To
overcome these restrictions we propose to approximate the hypervolume function
with a deep neural network, which we call DeepHV. For better sample efficiency
and generalization, we exploit the fact that the hypervolume is
scale-equivariant in each of the objectives as well as permutation invariant
w.r.t. both the objectives and the samples, by using a deep neural network that
is equivariant w.r.t. the combined group of scalings and permutations. We
evaluate our method against exact, and approximate hypervolume methods in terms
of accuracy, computation time, and generalization. We also apply and compare
our methods to state-of-the-art multi-objective BO methods and EAs on a range
of synthetic benchmark test cases. The results show that our methods are
promising for such multi-objective optimization tasks.
Comment: Updated with camera-ready version. Accepted at ICLR 202
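In two dimensions the hypervolume indicator can still be computed exactly with a simple sweep, which also makes the symmetries exploited by DeepHV easy to check. A self-contained sketch (illustrative, not the paper's implementation):

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    # Exact 2D hypervolume (maximization): area dominated by the point
    # set above the reference point, via a sweep in decreasing f1.
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > prev_y:                    # points not improving f2 are dominated
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
hv = hypervolume_2d(front)

# Scale-equivariance per objective: scaling f1 by 2 and f2 by 5
# multiplies the hypervolume by 10.
assert abs(hypervolume_2d([(2.0 * x, 5.0 * y) for x, y in front]) - 10.0 * hv) < 1e-12
# Permutation invariance w.r.t. samples and (with a symmetric ref) objectives.
assert hypervolume_2d(list(reversed(front))) == hv
assert hypervolume_2d([(y, x) for x, y in front]) == hv
```

These are exactly the symmetries (scalings and permutations) that the equivariant architecture of DeepHV is built to respect; the exact sweep above does not scale to many objectives, which motivates the learned approximation.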
Improving Fair Predictions Using Variational Inference In Causal Models
The importance of algorithmic fairness grows with the increasing impact
machine learning has on people's lives. Recent work on fairness metrics shows
the need for causal reasoning in fairness constraints. In this work, a
practical method named FairTrade is proposed for creating flexible prediction
models which integrate fairness constraints on sensitive causal paths. The
method uses recent advances in variational inference in order to account for
unobserved confounders. Further, a method outline is proposed which uses the
causal mechanism estimates to audit black box models. Experiments are conducted
on simulated data and on a real dataset in the context of detecting unlawful
social welfare. This research aims to contribute to machine learning
techniques that honour our ethical and legal boundaries.
Pruning via Iterative Ranking of Sensitivity Statistics
With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated
that modern neural networks can effectively be pruned before training. Yet, its
sensitivity criterion has since been criticized for not propagating training
signal properly or even disconnecting layers. As a remedy, GraSP
[arXiv:2002.07376v1] was introduced, compromising on simplicity. However, in
this work we show that by applying the sensitivity criterion iteratively in
smaller steps - still before training - we can improve its performance without
complicating the implementation. As such, we introduce 'SNIP-it'. We then
demonstrate how it can be applied for both structured and unstructured
pruning, before and/or during training, thereby achieving state-of-the-art
sparsity-performance trade-offs while already providing the computational
benefits of pruning during training from the start. Furthermore, we evaluate
our methods on robustness to overfitting, disconnection and adversarial
attacks.
Comment: 25 pages, 21 figures, 62 pictures, typos corrected, reference added
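The iterative use of the sensitivity criterion can be caricatured in a few lines: recompute the SNIP score |w * g| on the surviving connections and prune a growing fraction in small steps. This is a toy sketch with a caller-supplied gradient function; `iterative_sensitivity_prune` and its linear sparsity schedule are illustrative assumptions, not the paper's implementation:

```python
def iterative_sensitivity_prune(weights, grad_fn, target_sparsity, steps):
    # weights: flat list of connection weights; grad_fn(masked_weights)
    # returns one gradient per connection. Sensitivity per connection is
    # the SNIP criterion |w * g|, recomputed at every pruning step.
    n = len(weights)
    mask = [1.0] * n
    for step in range(1, steps + 1):
        frac = target_sparsity * step / steps        # sparsity after this step
        n_keep = n - int(round(frac * n))
        masked = [w * m for w, m in zip(weights, mask)]
        grads = grad_fn(masked)                      # fresh training signal
        scores = [abs(w * g) * m for w, g, m in zip(masked, grads, mask)]
        kept = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_keep])
        mask = [1.0 if i in kept else 0.0 for i in range(n)]
    return mask

# With unit gradients the score reduces to |w|: over two steps, half of
# the connections (those with smallest magnitude) are pruned.
toy = [0.1, 2.0, 0.05, 1.5, 0.3, 0.8]
mask = iterative_sensitivity_prune(toy, lambda ws: [1.0] * len(ws), 0.5, 2)
```

In a real network the gradient would come from a forward/backward pass on a batch of data at each step, which is what allows the iterative variant to avoid the layer-disconnection failures attributed to one-shot SNIP.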